(c) 2020 Julian Wagner. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license. Code for non-parametric bootstrapping of linear regression slopes from work by Justin Bois.
import numpy as np
import pandas as pd
import bokeh
import bokeh.io
import bokeh.plotting
import bebi103
import bokeh_catplot
import re
import glob
bokeh.io.output_notebook()
Liometopum occidentale is an ecologically dominant ant of oak forests in the Southwest United States and Mexico. It has numerous symbionts, including several species of beetle, a cricket, as well as hemipterans. Pamillia behrensii is one associate of the L. occidentale. P. behrensii has a metathoracic gland with classic defensive compounds including benzoquinones, a potent irritant. To test whether the bug uses it's gland secretion for defense, we performed behavioral experiments coupled with solid phase micro-extraction and mass spectrometry to assess whether aggressive interactions with ants elicit gland usage. Here, we measured whether the amount of bug movement over the course of a behavior trial (a proxy for attempts to escape/avoid a threatening stimulus) correlates with amount of gland secretion deployed.

Pamillia behrensii. The metathoracic gland can be seen on the underside of the bug, between second and third set of legs; it is the white curved structure.
To assess gland deployment, we employed an infrared illuminated arena, and placed a bug with either 5 ants, 5 ants with glued mandibles (to reduce their aggression/potency against the bug), or alone. We chilled the animals on ice and loaded them into the arena. The trials were run for 30 minutes with SPME fiber deployed, before the SPME fiber was injected into a GCMS. To start the analysis, we begin by loading up metadata on the behavioral videos into a dataframe.
metadata = pd.read_csv("./pamillia_dlc_analysis/pamillia_spme_metadata.csv", index_col=0)
metadata['blob_file'] = [re.sub("DLC.*", "_dist_im_an.csv", re.sub("./pamillia_dlc_analysis/", "C:/Users/jwagne/git/pamillia_only-julian_wagner-2020-08-05/videos/", f)) for f in metadata['DLC_file']]
metadata.head()
Peaks from the GCMS run were manually integrated for the four identified gland compounds to get total counts, a measure for total amount of each compound from the gland secretion. In order to look at the behavior of the bug, namely its movement over time, we used DeepLabCut (DLC). Markers for the head, two places on the thorax and two on the abdomen were manually annotated on 1033 frames from a variety of of videos. Some images were annotated as refinement of the trained network after identification of poorly labeled frames. The behavioral videos and the results of the annotation with DLC look as follows:

We can take a look at the chemical data to see amounts of gland compounds deployed in our different treatment conditions (5 ants [5_locc], 5 glued-mandible ants [5_locc_glued], or bug alone [alone]). We plot both raw $\mathrm{count}$ data as well as $log(\mathrm{count})$ data due to the large scale of the axis.
metadata['log heptan-2-one'] = np.log(metadata['heptan-2-one'])
metadata['log heptyl acetate'] = np.log(metadata['heptyl acetate'])
metadata['log 2-ethyl-1-4-benzoquinone'] = np.log(metadata['2-ethyl-1-4-benzoquinone'])
metadata['log 2-ethyl-1-4-hydroquinone'] = np.log(metadata['2-ethyl-1-4-hydroquinone'])
plots = []
for chem in ['heptan-2-one', 'heptyl acetate', '2-ethyl-1-4-benzoquinone', '2-ethyl-1-4-hydroquinone',
'log heptan-2-one', 'log heptyl acetate', 'log 2-ethyl-1-4-benzoquinone', 'log 2-ethyl-1-4-hydroquinone',]:
p = bokeh_catplot.strip(
data=metadata.loc[metadata['SPME_total_time']>1600],
cats='interactor',
val=chem,
jitter=True,
horizontal=True,
marker_kwargs=dict(alpha=0.5, size=8),plot_height=150,palette=bokeh.palettes.viridis(3), plot_width=500,title='Counts from integrated GCMS peak',
)
plots.append(p)
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))
We can see quite a bit of variability between trials, but a generally very high quantity of gland compound deployed when the bug interacted with ants, whereas a small/GCMS baseline amount present when the bug was alone in the arena. This indicates the bug actively uses the gland, rather than the compounds continuously diffusing from the gland. The stark difference in the distribution of amount of chemical used when the big faces an enemy vs is alone is more clearly seen with an empirical cumulative distribution of the data.
plots = []
for chem in ['heptan-2-one', 'heptyl acetate', '2-ethyl-1-4-benzoquinone', '2-ethyl-1-4-hydroquinone']:
p = bokeh_catplot.ecdf(
data=metadata.loc[metadata['SPME_total_time']>1600],
cats='interactor',
val=chem,
marker_kwargs=dict(alpha=1), style='staircase', plot_width=500,palette=bokeh.palettes.viridis(3),title='Counts from integrated GCMS peak'
)
p.legend.location = 'bottom_right'
plots.append(p)
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))
The distributions look pretty similar for the ant and glued ant conditions, and compound amount near to zero for the big alone. We can now look at whether the level of movement of the bug, a proxy for how much it engages in an escape response or how agitated it is, correlates with how much chemical it secreted from its gland during the trial. We define some helper functions to read in the DLC data and filter it.
def read_files(DLC_file, time_file, network, start, stop, SPME_time_buffer=20):
df_pos = pd.read_hdf(DLC_file)
df_pos_time = pd.read_csv(time_file)
df_pos[(network, 'all', 't')] = df_pos_time['time']
df_pos_med = df_pos.rolling(5).median()
df_pos_part_med_x = np.median(df_pos_med.dropna()[[(network, 'bh', 'x'), (network, 'bt1', 'x'), (network, 'bt2', 'x'), (network, 'ba1', 'x'), (network, 'ba2', 'x')]], axis=1)
df_pos_part_med_y = np.median(df_pos_med.dropna()[[(network, 'bh', 'y'), (network, 'bt1', 'y'), (network, 'bt2', 'y'), (network, 'ba1', 'y'), (network, 'ba2', 'y')]], axis=1)
df_tidy = pd.DataFrame({'t':df_pos_med.dropna()[(network, 'all', 't')], 'x':df_pos_part_med_x, 'y':df_pos_part_med_y})
if stop == 'full':
df_tidy = df_tidy.loc[df_tidy['t']>start]
else:
df_tidy = df_tidy.loc[(df_tidy['t']>start) & (df_tidy['t']<stop)]
return(df_tidy)
def load_DLC_medians(metadata, ind, SPME_time_buffer=20):
DLC_file = metadata.iloc[ind]['DLC_file']
time_file = metadata.iloc[ind]['timestamp_file']
if ';' in metadata.iloc[ind]['DLC_file']:
file1, file2 = re.match('([^;]+); *([^;]+)', DLC_file).groups()
file1_time, file2_time = re.match('([^;]+); *([^;]+)', time_file).groups()
network = re.search('(DLC.*).h5', file1)[1]
start, stop = (metadata.iloc[ind]['SPME_start_time'], metadata.iloc[ind]['SPME_stop_time'])
df1 = read_files(file1, file1_time, network, start, 'full')
df2 = read_files(file2, file2_time, network, 0, stop)
df2['t'] = df2['t'] + df1['t'].max()
df_tidy = pd.concat([df1, df2])
else:
network = re.search('(DLC.*).h5', DLC_file)[1]
start, stop = (metadata.iloc[ind]['SPME_start_time'], metadata.iloc[ind]['SPME_stop_time'])
df_tidy = read_files(DLC_file, time_file, network, start, stop)
return(df_tidy)
With these functions ready, we can iterate through the the data and read in the position information of the bug in the arena over time. Note that there were some preliminary experiments that were shorter than 30 mins. With SPME, if the trial is not run for a comparable length of time, the chemical amounts cannot be directly compared. Hence, these short trials were excluded from the following analysis.
cols = {'5_locc':bokeh.palettes.viridis(3)[0], '5_locc_glued':bokeh.palettes.viridis(3)[1], 'alone':bokeh.palettes.viridis(3)[2]}
df_t_list, colors, interactors, IDs, BQs, pixPerMms, trialOrder, dists = ([], [], [], [], [], [], [], [])
for i in range(len(metadata)):
if metadata.iloc[i]['SPME_total_time'] < 1500:
continue
df_t_list.append(load_DLC_medians(metadata, i))
colors.append(cols[metadata.iloc[i]['interactor']])
interactors.append(metadata.iloc[i]['interactor'])
IDs.append(metadata.iloc[i]['ID'])
BQs.append(metadata.iloc[i]['2-ethyl-1-4-benzoquinone'])
pixPerMms.append(metadata.iloc[i]['pixPerMm'])
trialOrder.append(metadata.iloc[i]['order'])
try:
df_dists = pd.read_csv(metadata.iloc[i]['blob_file'])
dists.append(np.sum(df_dists.loc[df_dists['all_dists']< 5000].rolling(10).median()['all_dists'].dropna().values < 100)*(1/25)*(1/60))
except:
dists.append(0)
We can now check on what the positional data looks like. For this, we will plot the x-position of a bug over time. Each line is a different experiment, with the bug ID listed on the left. The y-axis represents the position in the arena over time, so movement up or down in that axis represents movement across the arena. The positional values are shifted and scaled to show all the trials.
p = bokeh.plotting.figure(plot_width=1500, x_axis_label='time (min)')
p.yaxis.visible=False
for i, df_t in enumerate(df_t_list):
p.line((df_t['t']-df_t['t'].min())/60, 1.5*IDs[i]+trialOrder[i]/4+df_t['x']/6000, color=colors[i], legend_label=interactors[i], line_width=3)
if trialOrder[i] == 1 or IDs[i]==1:
mytext = bokeh.models.Label(x=-0.8, y=1.5*IDs[i]+trialOrder[i]/4, text=str(IDs[i]), text_align='center', text_font_size='18pt')
p.add_layout(mytext)
mytext = bokeh.models.Label(x=-0.8, y=1.5*IDs[i]+trialOrder[i]/4, text='ID#', text_align='center', text_font_size='12pt')
p.add_layout(mytext)
p.legend.location='top_right'
bokeh.io.show(p)